diffusion policy
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
Unlike Gaussian policies, the log-likelihood in diffusion policies is inaccessible; thus this entropy term is nontrivial. Moreover, to reduce the large variance of diffusion policies, we also develop an efficient behavior policy through action selection. This can further improve its sample efficiency during online interaction.
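One plausible reading of "action selection" is to draw several candidate actions from the stochastic policy and execute the one a critic scores highest. The sketch below illustrates that reading only; it is not taken from the paper. The names `select_action`, `sample_action`, `q_value`, and `num_candidates` are hypothetical, and the toy sampler and critic stand in for a trained diffusion policy and Q-network.

```python
import numpy as np

def select_action(state, sample_action, q_value, num_candidates=8, rng=None):
    """Draw candidate actions from a stochastic policy and keep the highest-Q one."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample several candidates for the same state (hypothetical sampler interface).
    candidates = np.stack([sample_action(state, rng) for _ in range(num_candidates)])
    # Score each candidate with the critic (hypothetical critic interface).
    scores = np.array([q_value(state, a) for a in candidates])
    return candidates[int(np.argmax(scores))]

# Toy stand-ins (assumptions, not a trained policy or critic).
def toy_sampler(state, rng):
    return rng.normal(loc=0.0, scale=1.0, size=2)

def toy_q(state, action):
    # Peaks at action = (0.5, 0.5).
    return -float(np.sum((action - 0.5) ** 2))

action = select_action(np.zeros(3), toy_sampler, toy_q, num_candidates=16,
                       rng=np.random.default_rng(0))
```

Taking the best of several samples narrows the spread of executed actions, which is one way the variance concern mentioned above could be addressed during data collection.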
Diffusion Policies Creating a Trust Region for Offline Reinforcement Learning
Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference.
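To make the cost of iterative denoising concrete, here is a minimal sketch that assumes a standard DDPM-style ancestral sampler with variance `betas[t]` at each step. It is an illustration, not DQL's actual code: `sample_action_ddpm`, `eps_model`, and `CountingEps` are hypothetical names, and the placeholder network returns zeros rather than a learned noise prediction.

```python
import numpy as np

def sample_action_ddpm(state, eps_model, betas, action_dim, rng):
    """Ancestral DDPM sampling: one eps_model call per denoising step."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.normal(size=action_dim)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, state, t)  # network forward pass (the costly part)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step (t == 0).
        noise = rng.normal(size=action_dim) if t > 0 else np.zeros(action_dim)
        x = mean + np.sqrt(betas[t]) * noise
    return x

class CountingEps:
    """Hypothetical stand-in for a trained noise-prediction network."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x, state, t):
        self.calls += 1
        return np.zeros_like(x)  # placeholder output, not a learned prediction

betas = np.linspace(1e-4, 0.02, 50)
eps_model = CountingEps()
action = sample_action_ddpm(np.zeros(3), eps_model, betas, 2, np.random.default_rng(0))
print(eps_model.calls)  # one network call per denoising step
```

Each sampled action requires as many network evaluations as there are denoising steps, and a training loop that samples actions inside the policy update pays that cost on every batch.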
Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning
Diffusion policies have shown a strong ability to express complex action distributions in offline reinforcement learning (RL). However, they tend to overestimate Q-values on out-of-distribution (OOD) data points because the offline dataset covers only a limited part of the state-action space. To address this, this paper proposes a novel entropy-regularized diffusion policy and takes into account the confidence of the Q-value prediction with Q-ensembles. At the core of our diffusion policy is a mean-reverting stochastic differential equation (SDE) that transfers the action distribution into a standard Gaussian form and then samples actions conditioned on the environment state with a corresponding reverse-time process. We show that the entropy of such a policy is tractable and can be used to increase the exploration of OOD samples in offline RL training. Moreover, we propose using the lower confidence bound of Q-ensembles for pessimistic Q-value function estimation. The proposed approach demonstrates state-of-the-art performance across a range of tasks in the D4RL benchmarks, significantly improving upon existing diffusion-based policies.
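Two ingredients of this abstract can be illustrated with a small sketch under simplifying assumptions. First, a mean-reverting SDE of the form dx = -theta * x dt + sigma dW with constant coefficients and sigma^2 = 2 * theta (an assumption here; the paper may use time-dependent schedules) drives any starting value toward a standard Gaussian. Second, a lower confidence bound over an ensemble is taken as the ensemble mean minus `beta` times the ensemble standard deviation. All function names and the `beta` parameter are hypothetical.

```python
import numpy as np

def simulate_mean_reverting_sde(x0, theta=1.0, total_time=10.0, dt=0.01, rng=None):
    """Euler-Maruyama for dx = -theta * x dt + sigma dW with sigma^2 = 2 * theta."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * theta)  # this ratio gives a unit-variance stationary law
    x = np.array(x0, dtype=float)
    for _ in range(int(round(total_time / dt))):
        x = x - theta * x * dt + sigma * np.sqrt(dt) * rng.normal(size=x.shape)
    return x

def lcb_q(q_values, beta=1.0):
    """Pessimistic estimate: ensemble mean minus beta times ensemble std.

    q_values has shape (num_ensemble_members, batch_size).
    """
    q_values = np.asarray(q_values, dtype=float)
    return q_values.mean(axis=0) - beta * q_values.std(axis=0)

# Forward process: many copies of the same "action" value 3.0 end up
# approximately standard normal.
x_T = simulate_mean_reverting_sde(np.full(20000, 3.0), rng=np.random.default_rng(0))

# Ensemble of two critics on two state-action pairs: they disagree on the
# first pair and agree on the second, so only the first is penalized.
q_ensemble = np.array([[1.0, 2.0],
                       [3.0, 2.0]])
q_pessimistic = lcb_q(q_ensemble, beta=1.0)
```

The penalty grows with ensemble disagreement, so state-action pairs the critics are unsure about, which are typically the OOD ones, receive lower value estimates.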